Superlinear Parallelization of k-Nearest Neighbor Retrieval
Authors
Abstract
With m processors available, the k-nearest neighbor classifier can be straightforwardly parallelized with a linear speed increase of factor m. In this paper we introduce two methods that in principle are able to achieve this aim. The first method splits the test set into m parts, while the other distributes the training set over m sub-classifiers, and merges their m nearest neighbor sets with each classification. For our experiments we use TIMBL, an implementation of the k-NN classifier that uses a decision-tree structure for retrieving nearest neighbors, and that employs feature weighting. While the first method consistently scales linearly, with the second method we observe cases of both superlinear and sublinear scaling. Analysis shows that superlinear scaling can occur with datasets whose feature weights exhibit a low variance; retrieval of nearest neighbors from the tree structure becomes exponentially slower with more data. Hence, the retrieval of classifications from m sub-classifier decision structures, each built on a 1/m-th part of the training set, can be substantially more than m times faster.
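The second method described above can be sketched as follows. This is a minimal illustrative implementation, not the TIMBL tree-based retrieval from the paper: each sub-classifier holds a 1/m-th partition of the training set and returns its own k nearest neighbors, after which the m candidate sets are merged into a global top-k. All function and variable names are hypothetical.

```python
import heapq
from math import dist

def knn_one(train, query, k):
    """Brute-force k nearest neighbors of `query` within one partition."""
    return heapq.nsmallest(k, ((dist(x, query), y) for x, y in train))

def knn_distributed(train, query, k, m):
    """Split `train` over m sub-classifiers, query each, merge the m sets."""
    parts = [train[i::m] for i in range(m)]          # round-robin partition
    candidates = []
    for part in parts:                               # each iteration could run
        candidates.extend(knn_one(part, query, k))   # on its own processor
    return heapq.nsmallest(k, candidates)            # global top-k of m*k items

# The merged result matches querying the full training set in one pass:
train = [((0.0, 0.0), 'a'), ((1.0, 0.0), 'b'),
         ((0.0, 2.0), 'a'), ((3.0, 3.0), 'b')]
assert knn_distributed(train, (0.1, 0.1), k=2, m=2) == knn_one(train, (0.1, 0.1), 2)
```

The merge step is cheap (m·k candidates per query), so the observed scaling behavior is dominated by how sub-classifier retrieval cost grows with partition size, which is where the paper's superlinear effect arises.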
Similar papers
Superlinear parallelisation of the k-nearest neighbor classifier
With m processors available, the k-nearest neighbor classifier can be straightforwardly parallelized with a linear speed increase of factor m. In this paper we introduce two methods that in principle can achieve this aim. The first method splits the test set into m parts, while the other distributes the training set over m sub-classifiers, and merges their m nearest neighbor sets with each classif...
FUZZY K-NEAREST NEIGHBOR METHOD TO CLASSIFY DATA IN A CLOSED AREA
Clustering of objects is an important area of research and application in a variety of fields. In this paper we present an effective technique for data clustering and apply it to data in a closed area. We compare this method with the K-nearest neighbor and K-means methods.
A Parallel Algorithms on Nearest Neighbor Search
(k-)Nearest neighbor searching has very high computational costs. The algorithms presented for nearest neighbor search in high-dimensional spaces have suffered from the curse of dimensionality, which severely affects either the runtime or the storage requirements of the algorithms. Parallelization of nearest neighbor search is a suitable solution for decreasing the workload caused by nearest neigh...
Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data
Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
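The location-dependent bandwidth mentioned above can be written out explicitly. The following is the standard textbook form of the k-nearest neighbor kernel density estimator, not a formula taken from the cited paper:

$$\hat{f}_n(x) \;=\; \frac{1}{n\,R_k(x)} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{R_k(x)}\right),$$

where $K$ is a kernel function, $X_1,\dots,X_n$ are the sample points, and the bandwidth $R_k(x)$ is the distance from $x$ to its $k$-th nearest sample point, so that the estimator adapts its smoothing to the local density of the data.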
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to all kinds of library resources. However, classifying documents within a large amount of data remains a challenge and demands time and effort to find particular documents. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly for text documents. This is further facilitated by using Artificial...